This file contains a write-up of the more technical aspects of
uuencoding and uudecoding. First read the file UUSER.TXT, then read
this for more details.
Documentation for UUENCODE/DECODE 96 (v56)
UU-encoding is a way to code a file which may contain any characters
into a standard character set that can be reliably sent over diverse
networks.
THE CHARACTER ENCODING:
The basic scheme is to break groups of 3 eight bit characters (24 bits)
into 4 six bit characters and then add 32 (a space) to each six bit
character which maps it into a readily transmittable character. Another
way of phrasing this is to say that the encoded 6 bit characters are
mapped into the set:
`!"#$%&'()*+,-./0123456789:;<=>?@ABC...XYZ[\]^_
for transmission over communications lines.
As some transmission mechanisms compress or remove spaces, spaces are
changed into back-quote characters (a 96). (A better scheme might have
been to use a bias of 33 so a space is not created, but this is not
done.)
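The arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the scheme, not the code used by these programs; the function name uu_encode_triple is made up for the example.

```python
def uu_encode_triple(b0, b1, b2):
    """Encode three 8-bit values as four UU characters."""
    bits = (b0 << 16) | (b1 << 8) | b2  # pack 3 bytes into 24 bits
    # Split the 24 bits into four 6-bit values, high bits first.
    sextets = [(bits >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    # Add the bias of 32; a zero value would be a space, so use a back-quote.
    return "".join("`" if s == 0 else chr(s + 32) for s in sextets)


print(uu_encode_triple(ord("C"), ord("a"), ord("t")))  # prints 0V%T
```

Three zero bytes come out as four back-quotes rather than four spaces, which is the substitution described above.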
A newer, less popular encoding method, called XX-encoding, uses the set:
+-01..89ABC...XYZabc...xyz
In my opinion, XX-encoding is superior to UU-encoding because it uses
more "normal" characters that are less likely to get corrupted. In fact
several of the special characters in the UU set do not get through an
EBCDIC to ASCII translation correctly. Conversely, an advantage of the
UU set is that it does not use lower case characters. Nowadays both
upper and lower case are sent with no problems; maybe in the
communications dark ages there was a problem with lower case.
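As a rough sketch, XX-encoding differs from UU-encoding only in the lookup step: each 6-bit value indexes into the XX set instead of having 32 added to it. The Python below assumes the set is ordered exactly as listed above ("+", "-", digits, upper case, lower case); the names XX_CHARS and xx_encode_triple are illustrative.

```python
import string

# The 64-character XX set, in the order given in the text (an assumption).
XX_CHARS = "+-" + string.digits + string.ascii_uppercase + string.ascii_lowercase


def xx_encode_triple(b0, b1, b2):
    """Encode three 8-bit values as four XX characters."""
    bits = (b0 << 16) | (b1 << 8) | b2  # pack 3 bytes into 24 bits
    # Each 6-bit value is an index into the XX set.
    return "".join(XX_CHARS[(bits >> shift) & 0x3F] for shift in (18, 12, 6, 0))


print(len(XX_CHARS))                       # prints 64
print(xx_encode_triple(ord("C"), ord("a"), ord("t")))
```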
This "UU" encode/decode pair can handle either XX or UU encoding. The
encode program defaults to creating a UU encoded file; but can be run
with a "-x" option to create an XX encoding.
The decode program defaults to autodetect. However the program can get
confused by comment lines preceding the actual encoded data. The decode
mode can be forced to UU or XX with the "-u" or "-x" parameter.
Another option is for the character mapping table to be inserted at the
front of the file. The format for this is discussed later. A table
is automatically detected and used by this decode program. (A table
will override the "-x" or "-u" parameters.) The encode program can be
run with a "-t" option which tells it to put the table into the encoded
file.
A third encode mapping is the one used by Brad Templeton's ABE program.
This is not handled by these programs as the check and control
information surrounding the actual encoded data is in a different form.
From a theoretical view, this encoding breaks down 24 bits modulo 64.
Note that 64**4 = 2**24. The result is 24 bits in for 32 bits out, a
33% size increase. Note that 85**5 > 2**32. Also note that there are
94 transmittable ASCII characters (from 0x21 through 0x7e). Thus modulo
85 encoding (the atob encoder) transforms 32 bits to 5 ASCII chars or
40 bits for a 25% size increase.
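The size figures can be checked with a few lines of Python. The helper name overhead is made up for this example; it assumes each transmitted character occupies 8 bits.

```python
def overhead(bits_in, chars_out):
    """Fractional size increase when bits_in bits become chars_out 8-bit chars."""
    return (chars_out * 8 - bits_in) / bits_in


# Four base-64 digits cover exactly 24 bits.
assert 64 ** 4 == 2 ** 24
# Five base-85 digits are needed to cover 32 bits; four are not enough.
assert 85 ** 4 < 2 ** 32 < 85 ** 5

print(round(overhead(24, 4) * 100))  # prints 33
print(round(overhead(32, 5) * 100))  # prints 25
```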
The trade off in the modulo 85 encoding is that many communications
systems do not reliably transmit 85 ASCII characters. The tilde, caret,
brackets, and sometimes upper or lower case, may get corrupted.
There are three other popular encoding techniques. One is BinHex, used
on Apple computers. The current version is BinHex 4.0. BinHex uses
another mapping into 64 characters. The first encoded line in a BinHex
file is an encoded structure that contains the file name, size,
checksum, date and time. The remaining lines are encoded data.
Another technique that I have seen is BinMail used on Unisys A-series.
The most recent mechanism is incorporated into MIME and is called
Base64 encoding. I support this as a $20 shareware option. It uses
another character mapping of 64 characters with no checksums and no
special "end" indicator.
COMPOSING A LINE OF ENCODED CHARACTERS:
A small number of eight bit characters are encoded into a single line
and a count is put at the start of the line. (Most lines in an encoded
file have 60 encoded characters (45 original bytes). When you look at
a UU-encoded file note that most lines start with the letter "M". "M"
is decimal 77 in ASCII which, minus the 32 bias, is 45.)
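A minimal Python sketch of composing one UU line, count character first. The names uu_char and uu_encode_line are illustrative, and padding a short final group with zero bytes is an assumption of the sketch (it is the usual practice, but is not spelled out above).

```python
def uu_char(value):
    """Map a 6-bit value to its UU character (zero becomes a back-quote)."""
    return "`" if value == 0 else chr(value + 32)


def uu_encode_line(chunk):
    """Encode up to 45 original bytes as one UU line, without a newline."""
    if len(chunk) > 45:
        raise ValueError("a UU line carries at most 45 original bytes")
    # Pad with zero bytes so the data splits into whole groups of three.
    padded = chunk + b"\0" * (-len(chunk) % 3)
    out = [uu_char(len(chunk))]  # the count character comes first
    for i in range(0, len(padded), 3):
        bits = int.from_bytes(padded[i:i + 3], "big")
        out.extend(uu_char((bits >> s) & 0x3F) for s in (18, 12, 6, 0))
    return "".join(out)


line = uu_encode_line(bytes(range(45)))
print(line[0], len(line) - 1)  # prints M 60
```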
BinHex does not use a count character; every encoded line contains 64
characters, except the last, which is limited by the file size.
Base64, like BinHex, does not use a count character. An encoded line
can be any length that is a multiple of 4. The last line is also a
multiple of 4 in length, but the "=" character indicates no character
so the actual file length can be recreated.
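To illustrate how the "=" padding lets a decoder recover the original length, here is a small Python sketch using the standard base64 module. The helper base64_decoded_length is a made-up name for the example.

```python
import base64


def base64_decoded_length(line):
    """Number of original bytes represented by one Base64 line."""
    if len(line) % 4 != 0:
        raise ValueError("Base64 line length must be a multiple of 4")
    padding = len(line) - len(line.rstrip("="))  # trailing "=" count (0 to 2)
    return len(line) // 4 * 3 - padding


for data in (b"C", b"Ca", b"Cat"):
    encoded = base64.b64encode(data).decode("ascii")
    print(encoded, base64_decoded_length(encoded))
# prints:
# Qw== 1
# Q2E= 2
# Q2F0 3
```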
This encode program optionally puts a check character at the end of each
line. The check is the sum of all the encoded characters, before adding
the mapping, modulo 64.
Note: Horton 9/1/87 UUENCODE has a bug in the line check algorithm; it
uses the sum of the original, not the encoded characters. This decode
program accepts either form of line check character.
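The two forms of line check can be sketched in Python as follows. This is an illustration under stated assumptions, not the code of this package: the sketch sums only the data characters (whether the count character takes part is not stated above), pads a short final group with zero bytes, and uses made-up function names.

```python
def sextets(chunk):
    """Split the bytes of one line into 6-bit values (zero-padded to 3s)."""
    padded = chunk + b"\0" * (-len(chunk) % 3)
    values = []
    for i in range(0, len(padded), 3):
        bits = int.from_bytes(padded[i:i + 3], "big")
        values.extend((bits >> s) & 0x3F for s in (18, 12, 6, 0))
    return values


def line_check_encoded(chunk):
    """Check value as described: sum of the encoded 6-bit values, modulo 64."""
    return sum(sextets(chunk)) % 64


def line_check_horton(chunk):
    """The Horton variant: sum of the original bytes, modulo 64."""
    return sum(chunk) % 64


print(line_check_encoded(b"Cat"), line_check_horton(b"Cat"))  # prints 63 24
```

A decoder that wants to accept either form, as this one does, could simply compare the received check value against both results.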
In previous versions of my package (4.13 and lower) the line check
character was generated by default by the encode program and was
suppressed with the "-L" option. One reason to suppress them is if
the decode will be done by one of the old Horton decoders. Most modern
decoders either accept this form of line level check or will simply stop
looking after the line length is exhausted. My feelings are mixed about
the current need for line checksums because errors of this type
essentially never occur.
Given modern, error-free communications systems and the CRC checks on
the entire file (see below) I have made the default for uuencoding to
have NO line level check characters effective version 4.21. The "-L"
option on uuencode turns on generation of line checksums. If you have a
problem communications system and want to isolate the trouble, turn them
on.
Uudecode automatically checks for the presence of line checksums; so the
default for uudecode is to leave line level checks on. If there are some
problems the "-L" option for uudecode turns them off; sometimes there
is junk at the end of the line which causes spurious line checksum
errors.
I have encountered various other ways that encoders end lines. One
encoder put an "M" at both the start and end of the line. Another used
a line count character. This decode program checks all of these. I
would not be surprised if some encoder out there ends lines with
sequential astrological symbols. If you encounter some other weird form
of encoded file, let me know. (The -L option turns line level checking
off.)
PACKAGING THE LINES INTO FILES:
The lines of encoded data can be preceded by comments and by network
addressing information. The encoded data is directly preceded by a line
containing:
begin <file-mode> <file-name>
This line is created by the encode program. The decode program scans
the file looking for "begin" in column 1. If the following line is
encoded data, the decoding process begins.
Some encoders put file time and date information on the begin line:
begin <file-mode> <file-name> <date> <time>
My UUdecoder will accept this form of begin line, but does not use the
time and date information.
The final end of encoded data is an encoded line with zero encoded
characters (a back-quote), followed by a line containing "end".
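The overall layout (a "begin" line, the data lines, a zero-length line, then "end") can be sketched in Python with the standard binascii module. The function names are made up for the example, and the sketch leaves out section headers, checksums, and tables; it also ignores any date and time on the begin line by treating everything after the mode as the name.

```python
import binascii


def uu_encode_file(data, name, mode=0o644):
    """Wrap data in a begin line, UU data lines, a zero-length line, and end."""
    lines = ["begin {:o} {}".format(mode, name)]
    for i in range(0, len(data), 45):
        raw = binascii.b2a_uu(data[i:i + 45], backtick=True)
        lines.append(raw.decode("ascii").rstrip("\n"))
    lines.append("`")  # an encoded line with zero characters
    lines.append("end")
    return "\n".join(lines) + "\n"


def uu_decode_file(text):
    """Skip leading comments, then decode from the begin line to the end line."""
    lines = text.splitlines()
    start = next(i for i, l in enumerate(lines) if l.startswith("begin "))
    _, _mode, name = lines[start].split(" ", 2)
    out = bytearray()
    for line in lines[start + 1:]:
        if line == "end":
            break
        out += binascii.a2b_uu(line.encode("ascii"))
    return name, bytes(out)


text = "Some mail header\n" + uu_encode_file(b"Cat", "cat.txt")
print(text, end="")
print(uu_decode_file(text))
```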
For integrity checking, some encode programs insert checksums for the
entire file. This decode tries to check for all known types of file
checksums. This is discussed in more detail below.
This encode program puts a header line, containing the section number
and file name, in front of every section:
"section <number> of <max sections> of file <file name>"
At the end of a section the encode program inserts a line containing
checksum and file size information. This can be suppressed with the
"-c" option.
Other encoders use a variety of section lines:
The format of the Archive-name line is:
"Archive-name: <name>/part<number>"
for example:
Archive-name: diskutl/part02
The format of the part line is:
<name> part<number>/<max-number>
for example:
diskutl part02/03
WinCode uses:
[ section: 1/2 file: diskutl.exe . . . .
enuu uses:
section 001/002 diskutl.exe . . . .
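As a rough illustration, the section-line forms listed above could be recognized with regular expressions. The patterns below are inferred only from the examples shown here, so they are assumptions rather than the rules this decoder actually applies; the names SECTION_PATTERNS and parse_section_line are made up.

```python
import re

# One pattern per form shown above; order matters for the "section" forms.
SECTION_PATTERNS = [
    re.compile(r"^section (?P<num>\d+) of (?P<max>\d+) of file (?P<name>\S+)"),
    re.compile(r"^Archive-name: (?P<name>[^/\s]+)/part(?P<num>\d+)"),
    re.compile(r"^\[ section: (?P<num>\d+)/(?P<max>\d+) +file: (?P<name>\S+)"),
    re.compile(r"^section (?P<num>\d+)/(?P<max>\d+) +(?P<name>\S+)"),
    re.compile(r"^(?P<name>\S+) part(?P<num>\d+)/(?P<max>\d+)"),
]


def parse_section_line(line):
    """Return (name, number, max or None), or None if no form matches."""
    for pattern in SECTION_PATTERNS:
        match = pattern.match(line)
        if match:
            groups = match.groupdict()
            maximum = groups.get("max")  # absent in the Archive-name form
            return (groups["name"], int(groups["num"]),
                    int(maximum) if maximum else None)
    return None


print(parse_section_line("diskutl part02/03"))
print(parse_section_line("Archive-name: diskutl/part02"))
```

Matching the text is the easy part; telling a real section line from random text that happens to match still needs the kind of consistency checks on names and numbers that a decoder has to apply afterwards.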
This program checks for consistency of these names and numbers as of
release 5.0. The problem is distinguishing random text from valid
lines.
For each line that uudecode thinks is a "section" line, tests are made
to validate the current section number, the maximum section number and
the file name. The program is conservative and may sometimes
erroneously give an "invalid section line" type of error. Inspect the
file; if uudecode made a mistake, edit or delete the indicated line; and
continue. If the problem appears to be a uudecode problem, not just
some random comment lines that caused a one time problem, please contact
me.
All the "integrity fields" (the checksum, the line check, and the
section header line) are inserted in a way that they will be ignored by
other UUDECODE programs that cannot handle them. This decode program
does not require any of these fields; if not present, integrity checking
is not done. This program pair is 100% downward compatible!
FILE NAMES:
See UUSER.TXT for a discussion of file naming conventions.
DECODE and VALID LINES:
The information below is to help you solve infrequent problems when
decoding files. Normally you do not have to be concerned with any of
this.
UUdecode sometimes gets confused and thinks header lines are encoded
data. Sometimes this is because the separator line between sections
(the "cut" line) is indistinguishable from valid decodable data. (An
example is the line "---" used as a cut line on several DOS BBS
systems.) You can tell UUdecode that a specific line is a cut line and
not a decodable line with the -Z option:
uudecode -Z "---" myfile
Other times there is not a cut line between file sections or there is
some other problem with the file. If so, edit the file and try again.
When decode encounters a premature end-of-file or some data which is not
decodable, it assumes the end of a file section. Decode is conservative
when it encounters data it cannot decode (better an error than a bad
file).
Usually this undecodable data is valid "trailer" data put at the end of
the file for data transmission purposes. However, the file may be bad,
so decode continues to scan the file. If decode then encounters a line
which is decodable, it assumes the file is bad and displays an error
message.
When decode encounters a valid end of file section it must get the next
file in sequence. If the file name ends with a number, decode tries the
next file name in numeric sequence. Otherwise decode asks for a file
name. If this file does not contain decodable data, decode asks for
another file to try.
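The "next file name in numeric sequence" step might look like the following Python sketch. The function name is made up, and two details are assumptions of the sketch rather than documented behaviour: the number is taken from the end of the name before the extension, and leading zeros are preserved.

```python
import re


def next_section_name(filename):
    """Return the next name in numeric sequence, or None if there is no number."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:  # no extension at all
        stem, ext = filename, ""
    match = re.search(r"(\d+)$", stem)
    if not match:
        return None  # the caller would have to ask for a file name
    digits = match.group(1)
    bumped = str(int(digits) + 1).zfill(len(digits))  # keep leading zeros
    return stem[:match.start()] + bumped + dot + ext


print(next_section_name("PRG1.UUE"))    # prints PRG2.UUE
print(next_section_name("readme.txt"))  # prints None
```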
If multiple sections are saved in a single file, each section must have
some type of section line for validation. Decode builds a table of
section information so it can go back and reread if sections are not
saved in order.
The "SECTION" line inserted by the UUENCODE program is used for validity
checking only. If not present, decode will accept any file containing
encoded lines.
OTHER FILE FORMS:
Sometimes files are wrapped in shell archives that automatically check
sequencing and call uudecode for you on Unix systems. If you prefer to
download the raw files to MS-DOS, uudecode 5.33 will filter simple shell
scripts, that use the Unix 'sed' command, and decode the data
automatically.
There is one more rarely used feature of ENCODE: many input files can be
encoded into one large encode file. The end of an input file is a zero
length encoded line, followed by another "begin" line instead of by an
"end" line. This decode program will decode this sort of file; but the
encode will only handle a single input file. This mechanism is rarely
used with UUencoded files but is more common with Base64 encoded
messages.
FILE LEVEL CHECKSUMS:
There are three types of file checksums found in encoded files:
UUENCODE 2.14 and below inserted lines that gave the section
size and the original input file size. This is supplanted
by a better technique in 3.07; but 3.07 UUDECODE still checks
and validates the old form.
UUENCODE 3.07 and Rahul Dhesi's encode scripts compute a Unix
"sum -r" on the encoded sections and on the original input file.
A difference is that UUENCODE 3.07 puts the expected "sum -r"
and size at the end of a file while Rahul's scripts put them at
the beginning. This UUDECODE analyzes either.
The third form of checksum is a full 32 bit CRC that Rahul's
script inserts. My code does not handle this. Rahul has written
the BRIK program to check them. If there is a "sum -r" failure,
BRIK analysis should be considered.
Several encoders put in a line containing just the original file size.
My uudecode checks these.
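For reference, the Unix "sum -r" checksum is commonly the 16-bit rotating (BSD-style) sum. A minimal Python sketch of that algorithm follows; it assumes this is the variant meant above, and it does not show how the result is formatted or placed in the encoded file.

```python
def bsd_sum(data):
    """16-bit rotating checksum as computed by the BSD-style "sum" command."""
    checksum = 0
    for byte in data:
        # Rotate the 16-bit value right by one bit, then add the next byte.
        checksum = (checksum >> 1) + ((checksum & 1) << 15)
        checksum = (checksum + byte) & 0xFFFF
    return checksum


print(bsd_sum(b"a"))   # prints 97
print(bsd_sum(b"ab"))  # prints 32914
```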
TABLE LINES:
Some encoded files put the mapping used at the front of the encoded
file, just in front of the "begin" line. The format for this is:
table
first 32 characters
second 32 characters
All this starts in column 1.
If decode encounters a table specification, it uses it and overrides any
command line parameters. Encode will create the table lines if run with
the "-t" parameter.
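A decoder might read such a table along the following lines. This Python sketch is an assumption-laden illustration (the function name parse_table and the error handling are made up); it only shows how the two 32-character lines become a 64-entry lookup.

```python
def parse_table(lines):
    """Return {character: 6-bit value} from a table block, or None if absent."""
    it = iter(lines)
    for line in it:
        if line.rstrip("\r\n") == "table":
            # Strip only line endings: a table may start with a space.
            first = next(it, "").rstrip("\r\n")
            second = next(it, "").rstrip("\r\n")
            chars = first + second
            if len(chars) != 64 or len(set(chars)) != 64:
                raise ValueError("table must list 64 distinct characters")
            return {c: value for value, c in enumerate(chars)}
        if line.startswith("begin "):
            break  # reached the data without finding a table
    return None


xx_file = [
    "table",
    "+-0123456789ABCDEFGHIJKLMNOPQRST",
    "UVWXYZabcdefghijklmnopqrstuvwxyz",
    "begin 644 demo.bin",
]
mapping = parse_table(xx_file)
print(mapping["+"], mapping["z"])  # prints 0 63
```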
COMPLETION CODES:
On successful completion, UUDECODE sets ERRORLEVEL to 0. If there are
any problems, ERRORLEVEL is set to non-zero.
The purpose of "-e" is to automatically run an un-archiver (like PKZIP
or ARJ) when UUDECODE successfully completes. If the "-e" option is
given, UUDECODE calls BAT file UNARCUUE on successful completion;
UNARCUUE is passed five parameters:
the filename decoded (with path but no extension),
the file extension,
the input file name (with path but no extension),
the input file extension that is used,
the number of sections.
Normally the file extension tells which un-archiver to call. The
UNARCUUE BAT file can test these parameters and call the necessary
un-archiver. If UNARCUUE is called, the return code from UUDECODE is
the return code passed back from UNARCUUE. Note: one user had a problem
in that the routines called by UNARCUUE set the errorlevel to 1 which
was passed back to be the return code from UUDECODE.
The "-E" (upper case) option is like "-e" but you supply the name of
the BAT file to execute.
If you are running an automated system, the -y or -n options, which
force a "Y" or "N" response to messages, may be helpful.
Both UUencode and UUdecode can accept input from the command line or
from a response file. A response file is specified on the command line
by preceding a file name with "@". The default response file extension
is ".rsp". Thus
UUDECODE @uuopts @uufile.xyz out.fil
will first get command line options from the file uuopts.rsp, then will
get the input file name from file uufile.xyz, and will put the decoded
result into file out.fil.
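The response-file substitution can be pictured with a short Python sketch. It is a guess at the general mechanism, not the actual parser: the function name, the whitespace splitting of the file contents, and the read_text hook (used here so the example runs without real files) are all assumptions.

```python
import os


def expand_response_args(args, read_text=None):
    """Replace each @file argument with the words found in that file."""
    if read_text is None:
        def read_text(path):
            with open(path) as handle:
                return handle.read()
    expanded = []
    for arg in args:
        if arg.startswith("@"):
            path = arg[1:]
            if not os.path.splitext(path)[1]:
                path += ".rsp"  # default response file extension
            expanded.extend(read_text(path).split())
        else:
            expanded.append(arg)
    return expanded


# Stand-in file contents for the example; real use would read from disk.
files = {"uuopts.rsp": "-L -y\n", "uufile.xyz": "input.uue\n"}
print(expand_response_args(["@uuopts", "@uufile.xyz", "out.fil"],
                           files.__getitem__))
# prints ['-L', '-y', 'input.uue', 'out.fil']
```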
BUGS and PROBLEMS:
I try to make this program as good as possible. If you find a problem,
please send me a diskette. You can mail a 3 1/2" diskette in a regular
envelope, with no special protection, with a single stamp.
CONCLUSION:
This works well for me. On UNIX I find something I want encoded in 3
sections:
PRG1, PRG2, PRG3.
I copy the three files down to my PC as PRG1.UUE, PRG2.UUE, and PRG3.UUE.
I then just enter UUDECODE PRG and the thing decodes.
Suggestions appreciated. The programs are written in Turbo Pascal with
about 5% TASM for speed. The source is not public domain.
The program is freeware for personal, private use. If used in any
other way, please contact me - there is a modest charge.
The feature of MIME Base64 decoding is available as a $20 option.
Richard Marks (AA3NO)
931 Sulgrave Lane
Bryn Mawr, PA 19010
Copyright (c) Richard E. Marks, Bryn Mawr, PA,
1992, 1993, 1994, 1995, 1996
Also Copyright program name of
UUENCODE 95, UUDECODE 95, UUCODE 95, UUWIN 95,
UUENCODE 96, UUDECODE 96, UUCODE 96, UUWIN 96,
UUENCODE 97, UUDECODE 97, UUCODE 97, UUWIN 97,
UUENCODE 98, UUDECODE 98, UUCODE 98, UUWIN 98,
UUENCODE 99, UUDECODE 99, UUCODE 99, UUWIN 99,
and any other variations of the above specific names followed by
a year (e.g. UUENCODE 2000)
Copyright (c) by Richard E. Marks, Bryn Mawr, PA 1995, 1996